Non-parametric Approximate Dynamic Programming via the Kernel Method

Authors

  • Nikhil Bhat
  • Ciamac C. Moallemi
  • Vivek F. Farias
Abstract

This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.

Similar resources

A Non-Parametric Approach to Dynamic Programming

In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin’s method. Furthermore, we also present a unified view of several well-known policy...

OPTIMIZATION OF A PRODUCTION LOT SIZING PROBLEM WITH QUANTITY DISCOUNT

The dynamic lot sizing problem is one of the significant problems in industrial units and has been considered by many researchers. Accounting for a quantity discount in the purchasing cost is an important and practical assumption in inventory control models, yet it has received less attention in the stochastic version of the dynamic lot sizing problem. In this paper, stochastic dyn...

Boosted Bellman Residual Minimization Handling Expert Demonstrations

This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED). In RLED, the goal is to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations. This is slightly different from the batch Reinforcement Learning (RL) framework where only fixed sample...

On the Approximation of a Conditional Expectation

In this paper, we discuss how to approximate the conditional expectation of a random variable Y given a random variable X, i.e. E(Y|X). We propose and compare two different non-parametric methodologies to approximate E(Y|X). The first approach (namely the OLP method) is based on a suitable approximation of the σ-algebra generated by X. A second procedure is based on the well-known kernel non-pa...

Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses

Recently, quantile regression (QR) models have often been applied to longitudinal data analysis. When the distribution of responses appears skewed and asymmetric due to outliers and heavy tails, QR models may work suitably. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...

Publication date: 2012